Using Mechanical Turk to Annotate Lexicons for Less Commonly Used Languages

نویسندگان

  • Ann Irvine
  • Alexandre Klementiev
چکیده

In this work we present results from using Amazon’s Mechanical Turk (MTurk) to annotate translation lexicons between English and a large set of less commonly used languages. We generate candidate translations for 100 English words in each of 42 foreign languages using Wikipedia and a lexicon induction framework. We evaluate the MTurk annotations by using positive and negative control candidate translations. Additionally, we evaluate the annotations by adding pairs to our seed dictionaries, providing a feedback loop into the induction system. MTurk workers are more successful in annotating some languages than others and are not evenly distributed around the world or among the world’s languages. However, in general, we find that MTurk is a valuable resource for gathering cheap and simple annotations for most of the languages that we explored, and these annotations provide useful feedback in building a larger, more accurate lexicon.

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

Creating and Validating Multilingual Semantic Representations for Six Languages: Expert versus Non-Expert Crowds

Creating high-quality wide-coverage multilingual semantic lexicons to support knowledge-based approaches is a challenging time-consuming manual task. This has traditionally been performed by linguistic experts: a slow and expensive process. We present an experiment in which we adapt and evaluate crowdsourcing methods employing native speakers to generate a list of coarse-grained senses under a ...

متن کامل

Building Subjectivity Lexicon(s) from Scratch for Essay Data

While there are a number of subjectivity lexicons available for research purposes, none can be used commercially. We describe the process of constructing subjectivity lexicon(s) for recognizing sentiment polarity in essays written by test-takers, to be used within a commercial essay-scoring system. We discuss ways of expanding a manually-built seed lexicon using dictionary-based, distributional...

متن کامل

Using the Amazon Mechanical Turk to Transcribe and Annotate Meeting Speech for Extractive Summarization

Due to its complexity, meeting speech provides a challenge for both transcription and annotation. While Amazon’s Mechanical Turk (MTurk) has been shown to produce good results for some types of speech, its suitability for transcription and annotation of spontaneous speech has not been established. We find that MTurk can be used to produce highquality transcription and describe two techniques fo...

متن کامل

Last Words: Amazon Mechanical Turk: Gold Mine or Coal Mine?

Recently heard at a tutorial in our field: “It cost me less than one hundred bucks to annotate this using Amazon Mechanical Turk!” Assertions like this are increasingly common, but we believe they should not be stated so proudly; they ignore the ethical consequences of using MTurk (Amazon Mechanical Turk) as a source of labor. Manually annotating corpora or manually developing any other linguis...

متن کامل

Amazon Mechanical Turk: Gold Mine or Coal Mine?

Recently heard at a tutorial in our field: “It cost me less than one hundred bucks to annotate this using Amazon Mechanical Turk!” Assertions like this are increasingly common, but we believe they should not be stated so proudly; they ignore the ethical consequences of using MTurk (Amazon Mechanical Turk) as a source of labor. Manually annotating corpora or manually developing any other linguis...

متن کامل

ذخیره در منابع من


  با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

برای دانلود متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

عنوان ژورنال:

دوره   شماره 

صفحات  -

تاریخ انتشار 2010